in the Linux Journal article A Statistical Approach to the Spam Problem [Robinson]. This

نویسنده

  • Gary Robinson
چکیده

One of the many techniques which has recently been employed for filtering spam is one described in the Linux Journal article A Statistical Approach to the Spam Problem [Robinson]. This technique incorporates ideas from the seminal article A Plan For Spam [Graham] as well as R.A. Fisher's technique for combining p-values by means of the chi-square distribution [Hedges & Olkin]. The technique presented in here takes the chi-square-based approach a step further by taking into account two facts: a) there is redundancy in the token probabilities, and b) spam and ham emails have different amounts of such redundancy. Fivefold cross-validation was carried out on the new technique and is described here testing whether these factors actually lead to better performance. The results were positive and statistically significant.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...

متن کامل

ارائه روشی مناسب برای دسته بندی نامه های الکترونیکی تبلیغاتی بر مبنای پروفایل کاربران

In general, Spam is related to satisfy or not satisfy the client and isn’t related to the content of the client’s email. According to this definition, problems arise in the field of marketing and advertising for example, it is possible that some of the advertising emails become spam for some users, and not spam for others. To deal with this problem, many researchers design an anti-s...

متن کامل

An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network

In recent years, there has been considerable interest among people to use short message service (SMS) as one of the essential and straightforward communications services on mobile devices. The increased popularity of this service also increased the number of mobile devices attacks such as SMS spam messages. SMS spam messages constitute a real problem to mobile subscribers; this worries telecomm...

متن کامل

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Emails are one of the fastest economic communications. Increasing email users has caused the increase of spam in recent years. As we know, spam not only damages user’s profits, time-consuming and bandwidth, but also has become as a risk to efficiency, reliability, and security of a network. Spam developers are always trying to find ways to escape the existing filters therefore new filters to de...

متن کامل

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004